11/7/23
Q: was curious about the sensitivity/cut-offs/specificity, but we mainly discussed it in class already; was still slightly confused about it?
A: We’ll discuss this in detail later this week and early next week!
Q: Are Youden’s indices related to ROC curves?
A: Related, yes! We’ll discuss both soon!
Q: I wasn’t sure how to interpret some of the visuals towards the end of lecture.
A: That’s OK! We’ll be recreating these and discussing them more as we do this case study in class.
Q: Did they actually use intravenous blood draws or just thumb pricks because multiple intravenous would not be fun.
A: It was venous blood from the arm. This unfunness is one of the reasons participants were compensated.
Q: Did this study (or other studies on THC) impairment end up influencing any legislation at the local or state level?
A: Great question! The state is currently reviewing these and other study’s data. The state was definitely aware of this study and waited (im)patiently while we analyzed and worked to publish.
Q: How long should our reports be?
A: It’s hard to say. We’ll discuss an example today so you have a sense!
Q: Regarding the final project, will there be a Google form that we can fill out that will help us form groups if we can’t form one ourselves? A: Yup - I’d say try to find a group using Piazza, in class, or during lab. However, if you’re unable, when you fill out the form to indicate your group next week, you’ll select that you’d like to be placed into a group.
Due Dates:
Notes:
Important
The CS01 data are data for you only. My collaborator is excited that y’all will be working on this…but these are still research data, so please do not share with others or post publicly.
See & Discuss: https://cogs137.github.io/website/content/cs/cs-example.html
Feedback to other students here
Common comments:
This is NOT the rubric for your case study, but it will be similar:
Two additional packages required for these notes:
Three matrices:
Variables:
ID | participants identifierTreatment | placebo, 5.90%, 13.40%Group | Occasional user, Frequent userTimepoint | indicator of which point in the timeline participant’s collection occurredtime.from.start | number of minutes from consumptionReading in the .RData we wrote at the end of the last set of notes…(using load)
This reads the objects stored in these files into your Environment for use.
For example…we discussed this function in the last set of notes.
We’re going to have a lot of functions throughout…like this helper function to clean up names
Functions can/should be stored in a separate .R file, probably in a src/ directory.
For a single compound…
But, with wide data, that’s not easy to do for all compounds, so you may want to pivot those data….
Distribtions across all compounds (WB):
We start to get a sense of the data with these quick and dirty plots, but we’re really only scratching the surface of what’s going on in these data.
These data require a lot of exploration due to the number of compounds, multiple matrices, and data over time aspects.
compound_scatterplot_group <- function(dataset, compound, timepoints){
if(max(dataset[,compound],na.rm=TRUE)==0){
print(
dataset %>%
filter(!is.na(time_from_start)) %>%
ggplot(aes_string(x="time_from_start",
y=compound,
color="group")) +
geom_point() +
geom_vline(data=timepoints, aes(xintercept=as.numeric(stop)),
linetype="dashed",
color="gray28") +
scale_color_manual(values=c("#19831C", "#A27FC9")) +
scale_y_continuous(limits=c(0,3)) +
theme_classic() +
theme(legend.position="bottom",
legend.title=element_blank()) +
labs(x='Time From Start (min)',
y=gsub('GLUC', 'gluc',gsub("_", "-", toupper(compound))))
)}else{
print(
dataset %>%
filter(!is.na(time_from_start)) %>%
ggplot(aes_string(x="time_from_start",
y=compound,
color="group")) +
geom_point() +
geom_vline(data=timepoints, aes(xintercept=as.numeric(stop)),
linetype="dashed",
color="gray28") +
scale_color_manual(values=c("#19831C", "#A27FC9")) +
theme_classic() +
theme(legend.position="bottom",
legend.title=element_blank()) +
labs(x='Time From Start (min)',
y=gsub('GLUC', 'gluc', gsub("_", "-", toupper(compound))))
)
}
}❓ Why is there no pairplot for Breath?
compound_boxplot_group_only <- function(dataset, compounds, tissue, legend=TRUE, y_lab=TRUE){
timepoint_to_use = levels(dataset$timepoint_use)[1]
df <- dataset %>%
filter(timepoint_use == timepoint_to_use) %>%
select(group, all_of(compounds))
df <- df %>%
gather(compound, value, -group) %>%
clean_gluc() %>%
group_by(compound) %>%
mutate(y_max = 1.2*max(value)) %>%
group_by(compound, group) %>%
mutate(n = n(),
my_label = paste0(group, ' N=', n),
my_label = gsub(" ", "\n", my_label))
if(tissue == "Blood"){
df$compound = factor(df$compound, levels=c("THC","11-OH-THC","THCCOOH","THCCOOH-gluc"))
}
y_pos <- df %>%
group_by(compound) %>%
summarize(y.position = mean(y_max))
stat.test <- df %>%
group_by(compound) %>%
t_test(value ~ my_label) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance()
test <- stat.test %>%
left_join(y_pos) %>%
mutate(p.adj.signif = ifelse(p.adj.signif=='?', 'ns', p.adj.signif),
p.adj = ifelse(p.adj < 0.001, "<0.001", p.adj))
if(legend){
leg_position = 'right'
}else{
leg_position = 'none'
}
if(y_lab){
y_text = "Concentration (ng/mL)"
}else{
y_text = ''
}
medianFunction <- function(x){
return(data.frame(y=round(median(x),1),label=round(median(x,na.rm=T),1)))}
p2 <- ggplot(df, aes(x=my_label, y=value, fill=my_label)) +
geom_jitter(position=position_jitter(width=.3, height=0), size = 0.8, color="gray65") +
geom_boxplot(outlier.shape=NA, alpha=0.6) +
stat_summary(fun = "median", geom = "point", shape = 19, size = 3, fill = "black") +
stat_summary(fun.data = medianFunction, geom ="text", color = "black", size = 3.5, vjust = -0.65) +
facet_wrap(~compound, scales="free_y", ncol=4) +
geom_blank(aes(y = y_max)) +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.1))) +
# scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
scale_fill_manual(values=c("#19831C", "#A27FC9")) +
theme_classic() +
theme(text = element_text(size=14),
legend.position=leg_position,
legend.title=element_blank(),
panel.grid = element_blank(),
strip.background = element_blank()) +
labs(title = tissue,
x = '',
y = y_text)
ann_text <- test %>%
select(compound, p.adj, value = y.position, my_label=group1) %>%
filter(p.adj < 0.05) %>%
mutate(x1 = 1, x2 = 2)
if(tissue == "Whole Blood"){
ann_text$compound = factor(ann_text$compound,
levels=c("THC","11-OH-THC","THCCOOH","THCCOOH-gluc"))
}
p2 + geom_text(data = ann_text, label = ann_text$p.adj, nudge_x = 0.5) +
geom_segment(data = ann_text, aes(x = x1, xend = x2,
y = value - (0.04 * value), yend = value - (0.04*value)))
}compound_boxplot_treatment <- function(dataset, compounds, tissue){
timepoint_to_use=levels(dataset$timepoint_use)[2]
df <- dataset %>%
filter(timepoint_use == timepoint_to_use) %>%
select(treatment, group, compounds)
df <- df %>%
gather(compound, value, -treatment, -group) %>%
clean_gluc()
df %>%
ggplot(aes(x=treatment, y=value, fill=group)) +
# geom_jitter(color="gray36") +
geom_boxplot(outlier.size=0.1) +
scale_fill_manual(values=c("#19831C", "#A27FC9")) +
facet_wrap(~compound, scales="free_y", ncol=4) +
scale_x_discrete(labels=function(x) str_wrap(x, width=11)) +
theme_classic(base_size=10) +
theme(legend.position="bottom",
legend.title=element_blank(),
panel.grid=element_blank(),
strip.background=element_blank()) +
labs(title=tissue,
x="Treatment",
y="Measurement (ng/mL)")
}T2A_plot <- function(dataset, compound, timepoint_use=2){
timepoint_to_use=levels(factor(dataset$timepoint_use))[timepoint_use]
if(max(dataset[,compound],na.rm=TRUE)==0){
print(
ggplot(subset(dataset, timepoint_use==timepoint_to_use),
aes_string(x="group",
y=compound,
fill="group")) +
geom_boxplot(outlier.size=0.1) +
scale_fill_manual(values=c("#19831C", "#A27FC9")) +
scale_x_discrete(labels=function(x) str_wrap(x, width=10)) +
scale_y_continuous(limits=c(0,3)) +
facet_grid(~treatment) +
theme_classic() +
theme(legend.position="none",
panel.grid=element_blank(),
strip.background=element_blank(),
plot.title.position="plot") +
labs(title=paste0('Timepoint: ',
levels(dataset$timepoint_use)[timepoint_use],
' post-smoking'),
x='Group',
y=gsub('GLUC', 'gluc',gsub("_", "-", toupper(compound))))
)}else{
print(
ggplot(subset(dataset, timepoint_use==timepoint_to_use),
aes_string(x="group",
y=compound,
fill="group")) +
geom_boxplot(outlier.shape=NA, alpha=0.8) +
scale_fill_manual(values=c("#19831C", "#A27FC9")) +
scale_x_discrete(labels=function(x) str_wrap(x, width=10)) +
facet_grid(~treatment) +
theme_classic() +
theme(legend.position="none",
panel.grid=element_blank(),
strip.background=element_blank(),
plot.title.position="plot") +
labs(title=paste0('Timepoint: ',
levels(dataset$timepoint_use)[timepoint_use],
' post-smoking'),
x='Group',
y=gsub('GLUC', 'gluc',gsub("_","-",toupper(compound))))
)
}
}